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ABSTRACT Related species are often used to understand the molecular underpinning of virulence through examination of a 
shared set of biological features attributable to a core genome of orthologous genes. An important but insufficiently studied is- 
sue, however, is the extent to which the regulatory architectures are similarly conserved. A small number of studies have com- 
pared the primary transcriptomes of different bacterial species, but few have compared closely related species with clearly diver- 
gent evolutionary histories. We addressed the impact of differing modes of evolution within the genus Mycobacterium through 
comparison of the primary transcriptome of M. marinum with that of a closely related lineage, M. bovis. Both are thought to 
have evolved from an ancestral generalist species, with M. bovis and other members of the M. tuberculosis complex having sub- 
sequently undergone downsizing of their genomes during the transition to obligate pathogenicity. M. marinum, in contrast, has 
retained a large genome, appropriate for an environmental organism, and is a broad-host-range pathogen. We also examined 
changes over a shorter evolutionary time period through comparison of the primary transcriptome of M. bovis with that of an- 
other member of the M. tuberculosis complex (M. tuberculosis) which possesses an almost identical genome but maintains a dis- 
tinct host preference. 

IMPORTANCE Our comparison of the transcriptional start site (TSS) maps of M. marinum and M. bovis uncovers a pillar of con- 
served promoters, noncoding RNA (NCRNA), and a genome-wide signal in the —35 promoter regions of both species. We iden- 
tify evolutionarily conserved transcriptional attenuation and highlight its potential contribution to multidrug resistance medi- 
ated through the transcriptional regulator whiB7. We show that a species population history is reflected in its transcriptome and 
posit relaxed selection as the main driver of an abundance of canonical — 10 promoter sites in M. bovis relative to M. marinum. 
It appears that transcriptome composition in mycobacteria is driven primarily by the availability of such sites and that their fre- 
quencies diverge significantly across the mycobacterial dade. Finally, through comparison of M. bovis and M. tuberculosis, we 
illustrate that single nucleotide polymorphism (SNP)-driven promoter differences likely underpin many of the transcriptional 
differences between M. tuberculosis complex lineages. 
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The genus Mycobacterium enumerates more than 100 species, 
among which are pathogens of global importance to both hu- 
mans and livestock (1), including the causative agent of tubercu- 
losis, M. tuberculosis. The genome of M. tuberculosis has been se- 
quenced for over a decade, and yet many important issues relating 
to the biology of mycobacteria remain poorly understood. Despite 
the application of comparative genomics, there is a dearth of stud- 
ies illustrating how evolutionary forces shape transcriptomes 
within this genus. M. bovis, the causative agent of bovine tubercu- 
losis, contains 3,952 predicted protein-coding genes and shares 
>99.95% sequence identity with M. tuberculosis (2). M. marinum 
is a pathogen of amphibians and one of the species most closely 
related to the M. tuberculosis complex, sharing almost 3,000 or- 
thologs with M. bovis and M. tuberculosis (ca. 75% of their total 
coding regions) (3). This core set of orthologs, with average amino 
acid identity of 85%, has underpinned the use of M. marinum as a 



model organism to study mycobacterial virulence and the evolu- 
tion of the M. tuberculosis complex (3). The 6-Mb genome of 
M. marinum is appreciably larger than those of the members of the 
M. tuberculosis complex, which are thought to have undergone a 
process of reductive evolution as a consequence of niche adapta- 
tion to the mammalian host cell environment (4). Though their 
genomes lack the obvious signatures of genome degradation 
found in other obligate pathogens, members of the M. tuberculosis 
complex are characterized by low sequence diversity and clonal 
evolution (5, 6). 

TSS mapping. Differential RNA sequencing (dRNA-seq) en- 
riching for primary transcripts has emerged as a highly accurate 
means of mapping the primary transcriptomes of a number of 
species (7, 8). Transcriptional start site (TSS) mapping through 
delineation of the regions upstream of the translation start points 
(TSPs) facilitates the identification of promoters and 5' untrans- 
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FIG 1 (A) Venn diagrams outlining the distribution of experimentally identified TSSs in M. marinum and M. bovis species. (B) Classification scheme used for 
TSSs associated with orthologous genes. Data represent the distribution of pTSSs associated with orthologous genes in each category for M. marinum and 
M. bovis. (C) pLogo depiction of base overrepresentation at nucleotide positions relative to the TSSs among conserved (CPT) primary promoters. The red 
horizontal lines indicate statistical significance, with Bonferroni correction. It can be seen that T residues at positions —35 and —36 are statistically enriched 
within these promoters relative to randomly selected genomic regions. 



lated region (5' UTR)-associated regulatory elements across an 
entire genome. The discovery of copious numbers of TSSs not 
obviously associated with mRNA transcripts in a number of stud- 
ies has led to an explosion in the numbers of predicted noncoding 
RNA (NCRNA) species. The very presence, however, of such large 
numbers of TSSs presents a significant challenge in identifying 
NCRNAs that are likely to act as regulatory elements, as only a 
small subset have been assigned functional roles. Conservation 
across conditions or between related species has been used as a 
surrogate for functionality in some studies (9) to account for the 
difficulty in interpreting the potentially confounding additive ef- 
fects of evolutionary forces. However, when the primary evolu- 
tionary forces acting on species are expected to be similar, their 
effects on the transcriptome are difficult to clearly identify. Given 
that relaxed selection is thought to be a significant force in the M. 
tuberculosis complex lineage ( 10), we compared the primary tran- 
scriptome of M. marinum with that of a member of the M. tuber- 
culosis complex (M. bovis). These organisms share a highly con- 
served core of orthologous genes that have experienced evolution 
in appreciably different ways, which should enable a comparison 
of the effects of reduced selection on transcriptional complexity. 

RESULTS 

Comparison of M. marinum and M. bovis reveals a core set of 
conserved promoters. Whole-transcriptome sequencing was per- 
formed on two independent biological replicates of exponentially 
growing M. bovis AF2 122/97 and M. marinum M. The pairwise 
correlation coefficients for these replicates demonstrated a high 
degree of reproducibility between runs (see Table SI in the sup- 
plemental material). dRNA-seq libraries yielded -12 to 13 million 
reads per replicate for both species, of which -40% to 60% 
mapped to the reference genomes following removal of rRNA 
reads. Alignment of the terminator exonuclease (TEX)-positive 
(TEX + ) and TEX - dRNA-seq libraries to the genomes of M. ma- 
rinum and M. bovis resulted in the identification of 6,331 and 
6,500 individual TSSs, respectively (see Table SI). Genome-wide 
TSS maps were generated by categorizing TSSs depending on their 



position relative to annotated genes, in a manner similar to that 
described by Sharma et al. (7) (see Fig. SI in the supplemental 
material). Specifically, all TSSs were assigned to at least one of five 
categories: primary TSS (pTSS), alternative TSS (altTSS), internal 
TSS (iTSS), antisense TSS (asTSS), and intergenic TSS (igTSS) 
(Fig. 1A). A full breakdown of the motifs found within the pro- 
moter regions of both species is given in Fig. S2. 

A total set of 2,939 orthologous genes shared between M. ma- 
rinum and M. bovis was identified for comparative analyses (see 
Table SI in the supplemental material). A total of 1,730 of 2,875 
(60%) pTSSs detected in M. marinum and 1,876 of 2,367 (79%) 
detected in M. bovis were assigned to genes classified as ortholo- 
gous. This indicates that our analysis of regulatory regions is rel- 
evant for the majority of the shared transcriptomes under the 
conditions tested. To detect the presence of shared regulatory fea- 
tures, we analyzed the upstream regions of orthologous genes in 
both species, employing a strategy derived from a recent study 
comparing the transcriptomes of Escherichia coli and Klebsiella 
pneumoniae (11) (Fig. IB). 

Specifically, having defined a promoter as the 50-nucleotide 
(nt) region upstream of a TSS, we categorized all TSSs into one of 
four groups based on conservation of the promoter sequence and 
the presence or absence of a TSS in the other species. Category one 
(conserved promoter with TSS [CPT]) consisted of a conserved 
promoter region with a TSS called in both species. Category two 
(conserved promoter with no matching TSS [CPNT] ) consisted of 
a conserved promoter with a TSS present in only one of the spe- 
cies. Category three, consisting of the orphan promoter (OP), 
showed no promoter conservation and a TSS in only one species, 
while category four promoters, the species-specific promoters 
(SSP), were promoters restricted to species-specific genes. 

A total of 9 1 4 primary TSSs were classified as CPT (see Table S2 
in the supplemental material) sharing conserved promoter re- 
gions, thus allowing a direct comparison of their associated regu- 
latory regions. The 5' UTR lengths of these genes, as defined by 
their primary TSS, were positively correlated (Pearson r = 0.55, P 
<2.2e-16), with 577 (63%) having lengths within 10 nt of each 
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other (see Table S3). Utilizing data from a recent study by Dejesus 
et al. (12) that reassigned TSPs in M. tuberculosis, we found that 
the 5' UTR length proved an accurate predictor of the direction 
and scale of corresponding reannotations for 66 of 80 (83%) of 
M. bovis orthologs within the CPT data set (see Table S3). 

MIRUs. Where length differences were observed between the 
5' UTRs of orthologous genes, a subset could be attributed to the 
presence of mycobacterial interspersed repetitive units (MIRUs) 
in M. bovis 5' UTRs (13). MIRU-based variable-number tandem- 
repeat (VNTR) molecular typing is a widely used tool for geno- 
typing of clinical isolates of M. tuberculosis, and MIRU elements 
are hypothesized to be transcribed and to contribute to antigenic 
variability (14, 15). Additionally, studies have shown that VNTR 
copy number variants are capable of altering promoter activity in 
neighboring genes (15, 16). The extent of MIRU influence on 
transcriptional control has not been extensively studied; however, 
of the 24 MIRU loci used for VNTR typing, 17 overlap or are 
contained within M. bovis 5' UTRs (see Table S3 in the supple- 
mental material). For example, gene MMAR_1344, encoding the 
chromosome segregation protein ParA in M. marinum, was found 
to be transcribed in a leaderless fashion, with exactly overlapping 
TSS and TSP positions. However, the orthologous gene in M. bovis 
(M. bovis Mb3239c) harbors a 77-nt 5' UTR consisting entirely of 
a class I MIRU. Notably, several regulatory genes carry MIRUs 
within their 5' UTRs (see Table S3), suggesting that MIRUs may 
exert a wider transcriptional influence than has previously been 
appreciated. 

Identification of a conserved signal at the —35 region of my- 
cobacterial promoters. The consensus — 10 hexamer motif (TAN 
NNT), which is recognized by the sigma A (SigA) principle cr 
factor in mycobacteria, was the dominant transcriptional regula- 
tory motif identified in both species (Fig. 1C), being found up- 
stream of 50.3% of TSSs detected in M. marinum and 62.8% of 
TSSs detected in M. bovis. The extended —10 class of promoter 
(TGNTANNNT) was identified upstream of 2.7% of TSSs in 
M. marinum and 4.4% of TSSs in M. bovis. Similarly to a recently 
published data set from M. tuberculosis (17), promoters contain- 
ing extended — 10 motifs were found to be associated with signif- 
icantly larger TSS peak heights in both species (P for both, <2.2e- 
16, Kruskal-Wallis test) (see Fig. S3 in the supplemental material). 
Among promoters lacking a — 10 box, two alternative motifs were 
detected (see Fig. S2), and both were associated with lower TSS 
peak heights than the consensus —10 box (P <2.2e-16 in both 
species, Kruskal-Wallis test). The more prominent of the two al- 
ternative motifs harbors a consensus G residue at position —14 
relative to the TSS [(— 14)GNNANNNT( — 7)]. Studies in E. coli 
suggest it is likely that a G residue at this position can compensate 
for the lack of a T residue at its canonical — 12 position (18). 

No single —35 hexamer has been identified across mycobacte- 
ria, and a recent global study of M. tuberculosis promoters was 
unable to identify a consensus motif at the site (17). However, the 
DNA sequence at this region can influence the activity of myco- 
bacterial promoters by modulating recognition by RNA polymer- 
ase (RNAP), and mutation or inversion of the —35 area can have 
an impact on transcription. In order to search for a conserved — 35 
signal, we examined the promoters of the shared (CPT) primary 
TSSs from both species using the motif visualization package 
pLogo (19), which allows an assessment of the statistical signifi- 
cance of sequence features compared to a background data set (see 
Materials and Methods). pLogo identified the existence of thy- 



mine (T) residues occurring above the level of statistical signifi- 
cance (P < 0.05, with Bonferroni correction) at positions — 35 and 
-36 in both species (Fig. 1C). Within the 914 CPT TSSs, 162 
(18%) were found to share a conserved TT motif within the —35 
region in both species (see Table S3 in the supplemental material). 
A number of the genes in this subset, including those encoding 
translation initiation factors (infA and infC), lipoproteins (IpqD 
and IpqF), and the WhiB family transcriptional regulators (whiBl 
and whiB3), are functionally related. 

Transcriptional attenuation within the 5 ' UTRs of shared or- 
thologs. The 5' ends of mRNA molecules are important as-acting 
regulatory elements, which can mediate transcriptional attenua- 
tion in response to a variety of stimuli (20). These complex RNA 
structures sometimes harbor low-level primary-sequence conser- 
vation, and only a small proportion are likely to have been iden- 
tified and annotated (20). Regulatory elements within 5' UTRs 
may be revealed by the premature attenuation of transcription 
upstream of their associated genes, as indicated by high ratios of 5' 
UTR-to-coding sequence (CDS) transcription, normalized to 
reads per kilobase per million mapped reads (RPKM) (21, 22). 

We identified a total of 291 genes with long (>30-nt) 5' UTRs 
and high 5' UTR-to-CDS RPKM ratios in M. marinum and 208 
such genes in M. bovis, 56 of which were shared by the two species 
(see Table S3 in the supplemental material). The 5' UTRs of genes 
within this subset had significantly higher sequence identities 
(median identity = 53%) than those of other orthologous genes 
(median identity = 41%) (P < 0.005, Wilcoxon Mann- Whitney 
test). Small regulatory RNAs (sRNAs) have been predicted in 7 of 
these 5' UTRs in M. tuberculosis (23), while an additional 6 are 
predicted to contain structured RNAs with homologs in the Rfam 
database of RNA families. The evolutionary conservation of high 
5' UTR-to-CDS RPKM ratios provides support for the idea of the 
participation of 5' UTRs in transcriptional attenuation (20). 

For example, the gene whiB7, which is upregulated in response 
to a variety of antibiotics and regulates the expression of a number 
of proteins involved in intrinsic antibiotic resistance (24), was 
found to have prominent 5' UTR expression relative to that of its 
CDS in both species (Fig. 4) . Within the 5 ' UTRs of whiB7 in both 
species, we identified a small, unannotated upstream open reading 
frame (uORF) located -10 nt downstream of a consensus Shine- 
Dalgarno (SD) motif. Transcription is seen to spike at the 5' of the 
uORF and attenuates rapidly 3' of the uORF, consistent with the 
presence of a conserved stem-loop structure with a T-rich tail, 
which may function as a rho-independent terminator (RIT) 
(Fig. 4). Previous studies have identified this ORF upstream of 
whiB7 in M. tuberculosis and have shown that whiB7 is transcrip- 
tionally coupled to it (25). Our data showing that the uORF and 
whiB7 have been cotranscribed across evolutionary time provide a 
strong case for its involvement in the regulation of whiB7 expres- 
sion. 

The expression levels of many genes are controlled by uORFs, 
which can respond to physiological signals (26). In the case of the 
erm family of macrolide resistance genes, antibiotic-promoted ri- 
bosome stalling and frameshifting in the uORF can independently 
activate antibiotic resistance regulons (27-29). A notable excep- 
tion to this is the erm(37) of M. tuberculosis, which is part of the 
whiB7 regulon (24), and macrolide resistance in mycobacteria re- 
sults from the induction of whiB7 (30). An intriguing possibility 
raised by our analysis is that antibiotic resistance in mycobacteria 
occurs via uORF-mediated transcriptional attenuation. Muta- 
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FIG 2 (A) Pie chart distribution of consensus — 10 box motifs detected within the promoters across each TSS class between M. marinum and M. bovis. (B) 
Comparison of SigA consensus — 10 box motif densities along the genomes of both species. Forward strand genes are indicated in purple, and reverse strand genes 
are in green. 



tions in the 5' UTR of whiB7, which can mediate antibiotic resis- 
tance in M. tuberculosis (31), may therefore promote translational 
readthrough of the uORF and result in activation of the whiB7 
regulon. 

Other genes found to have high 5' UTR-to-CDS RPKM ratios 
in both species included the M. marinum (MMAR_0811) and 
M. bovis (Mb0495) orthologs of the M. tuberculosis transcriptional 
regulator Rv0485, which regulates the PE13 and PPE18 gene pair 
and is essential for virulence (32). Similarly, the gene encoding the 
pyruvate dehydrogenase El subunit A {pdliA) component con- 
tains a highly expressed 5 ' UTR relative to the CDS in both species, 
in agreement with a previous report in Bacillus subtilis indicating 
that the leader region may participate in regulation (33). 

Pervasive transcription initiation drives divergent distribu- 
tion of TSSs between M. marinum and M. bovis. A striking fea- 
ture of the TSS maps of M. marinum and M. bovis is the divergence 
in TSS distribution, with M. bovis manifesting a much larger pro- 
portion of its assigned TSSs in all categories except the primary 
TSS (Fig. 1A). Additionally, in marked contrast to the relatively 
high conservation of pTSSs, other categories displayed an almost 
complete lack of conservation, with just 50 iTSSs and 49 asTSSs in 
total shared between the species, based on the best reciprocal 
alignment of the promoter sequences (see Table S2 in the supple- 
mental material). A comparison of the contributing motifs across 
the different TSS classes reveals that the elevated distribution of 
nonprimary TSSs in M. bovis is driven by an expanded represen- 
tation of — 10 box consensus promoters in these classes relative to 
M. marinum (Fig. 2A). 

We looked to determine whether this overrepresentation of 

— 10 box promoters was driven by underlying differences in the 
sequence compositions of the genomes of the two species. Al- 
though the GC content of M. marinum (65.7%) is almost identical 
to that of M. bovis (65.6%), the latter has 15,569 consensus —10 
box motifs (3,674 per Mb on average), a significantly higher den- 
sity than that of M. marinum, with an average of 2,556 per Mb (P 
<2.2e-16, Pearson's chi-square test) (Fig. 2B). The number of 

— 10 box motifs places a theoretical upper limit on the level of 
SigA-driven transcription initiation and predicts that such initia- 
tion is theoretically possible at over 1,000 additional locations per 
Mb in M. bovis compared to M. marinum. 

To explore the extent to which a genome's density of —10 



boxes correlates with promoter occupancy, we determined the 
number of consensus — 10 sites in M. bovis being filled by at least 
10 TEX + reads, a number which is lower than the cutoff of 20 
reads used for actual TSS calls, while maintaining a minimum 
TEX + /TEX~ ratio of at least 2:1. Applying this relaxed criterion, 
designed to mimic the effect of increased sequencing depth, re- 
sulted in 7,770 (50%) of the total 15,569 consensus — 10 box sites 
being occupied at the fully mapped library size of -6.6 million 
TEX + reads (excluding rRNA). We also determined the degree to 
which —10 site occupancy by the accumulation of TEX 4 " reads 
contributes to TSS identification by mapping the relationship 
across different library sizes (Fig. 3A). The rate at which —10 
boxes were found to become occupied mirrors almost exactly the 
call rate for TSSs, indicating that the identified TSSs and their 
distributions are largely functions of the presence of an appropri- 
ate initiation site combined with an appropriate sequencing 
depth. In contrast, equivalent numbers of randomly selected 
genomic positions show a significantly lower level of occupancy at 
each depth (P <2.2e-16, Pearson's chi-square test). 

These results indicate that the various densities of TANNNT 
motifs can contribute to large-scale divergences in primary tran- 
scriptome composition, even between closely related mycobacte- 
rial species. As the mutational bias is toward AT in bacteria (34), a 
parsimonious interpretation of the enlarged noncoding transcrip- 
tome of M. bovis relative to M. marinum is that it is driven by an 
accumulation of A+T-enriching single nucleotide polymor- 
phisms (SNPs) ineffectually purged by natural selection (35). As 
an extreme example, the genome of M. leprae, which has ~8% 
lower GC content than other mycobacteria and has experienced 
extensive genomic degradation (36), contains a very high density 
of consensus sites (12,728 per Mb, on average). 

Limited conservation of asTSSs between M. marinum and 
M. bovis belies a thematic conservation of antisense transcrip- 
tion. Sequencing in M. tuberculosis has uncovered an abundance 
of asRNA, generally originating antisense to a coding region or 
from the 3' UTR of a gene facing the opposite direction (21). The 
presence of pervasive intragenic transcription initiation in M. bo- 
vis relative to M. marinum is reflected in a very low level of con- 
servation between the species, in line with data from other studies 
(9). However, in both species, the genes in the subset with high 
antisense-to-sense RPKM ratios (>0.75) were enriched for inser- 



4 mBio' mbio.asm.org 



July/August 2014 Volume 5 Issue 4 e01169-14 



Relaxed Selection Drives Noisy Noncoding Transcriptome 



in 

4— 

o 

<u 
n 

£ 

3 



B 



I I TANNNT sites occupied 
LZ] Random siles occupied 
— overall TSS calls 




£ 




Sense Antisense 
M. bovis M. marinum 



Library size (million reads mapped) 



Position relative to start codon (nt) 



FIG 3 (A) Line plots show the number of — 10 box motifs showing RNAP occupancy at the indicated levels of sequence depth in M. bovis compared with 
randomly selected genomic sites. The dashed line shows the total number of actual TSS calls made with the available sequence data at each depth. (B) Expansion 
of TSSs within the upstream regions of orthologous genes in M. bovis. A histogram shows the density of TSSs upstream of orthologs with conserved TSSs (CPTs) 
in both species. The inset shows the distribution of nonprimary TSSs located within 500 nt of the TSP in both species. **,P < 0.005, Pearson's chi-squared test. 



tion sequences and phages (P < 0.05, Pearson's chi-square test), 
and this antisense transcription may serve to repress transposi- 
tion. A role for asRNA in genome maintenance has been suggested 
in a previous study in the rapidly growing species M. smegmatis 
(37), indicating that this likely represents a conserved function of 
asRNA in mycobacteria. TSS-driven asRNA was identified oppo- 
site the Casl and Cas2 genes of the clustered regularly interspaced 
short palindromic repeat (CRISPR) locus in M. bovis, analogous 
to that reported for M. tuberculosis (21), suggesting another po- 
tential role for antisense transcription in genome defense. 

5' UTRs and conservation of alternative TSSs. Our results 
indicate that each species has a degree of transcriptional "baggage" 
and predict that this phenomenon should affect all nonprimary 
TSS classes. pTSSs are assigned by peak height and proximity to a 
gene (£500 nt), with altTSSs residing within the same region and 
having lower peak heights. To compare the TSS compositions of 
this region between the species, we analyzed the set of 914 or- 
thologs with conserved pTSS (CPT). Overall, we found that M. bo- 
vis was more likely to have multiple TSSs upstream of these genes 
thanM. marinum (P < 0.005, Pearson's chi-square test) (Fig. 3B). 
Additionally, M. bovis was more likely to contain at least one 
nonprimary antisense TSS (i.e., an antisense TSS not associated 
with a divergently transcribed gene) within this region (P < 0.005, 
Pearson's chi-square test) (Fig. 3B, bottom right). The increased 
transcriptional initiation occurring on both strands proximal to 
the pTSSs of CPT orthologs may reflect increased transcriptional 
complexity in M. bovis or, as seems more likely, may result from 
the influence of relaxed selection and genetic drift on the tran- 
scriptome of M. bovis. 

Intergenic sRNAs. The genome-wide identification of TSSs 
has aided in the discovery and prediction of intergenic small reg- 
ulatory RNAs (sRNAs), which can act in trans on their regulatory 
targets (38). These frans-encoded RNAs generally exist as short 
(~50-to-300-bp) transcripts capable of acting locally or globally 
and resulting in the formation of secondary structures that can 
affect mRNA stability or occlude ribosome binding (38). A num- 
ber of studies in different mycobacteria have reported the identi- 
fication of intergenic sRNAs (21, 23, 39-41). To determine the 



degree of overlap of our TSS maps, we assembled a list of 42 ex- 
perimentally validated intergenic sRNAs from mycobacterial spe- 
cies (18fromM. bovis BCG [39,41], 16 from M. tuberculosis [21, 
23], 7 from M. smegmatis [37], and 1 from M. avium [40]) and 
mapped their locations by sequence homology to the genomes of 
both species (see Table S4 in the supplemental material). 

In total, of the 42 sRNAs, 22 could be mapped unambiguously 
to locations within the M. marinum genome, while 41 could be 
mapped to M. bovis. Of these, 14 (64%) were assigned TSSs in the 
former in our study, while 21 (51%) were assigned TSSs in the 
latter. A total of 1 1 TSS-associated sRNAs were common to the 
two species, representing a core set shared between the lineages 
(see Table S4 in the supplemental material). Notably, 9 TSS- 
associated sRNAs (41%) in M. marinum and 12 (29%) in M. bovis 
were identified within regions annotated as 5' UTRs, indicating 
the challenge of distinguishing 5' regulatory elements from bona 
fide intergenic transcripts. 

Comparison of TSS maps identifies SNP-driven promoter 
differences between M. bovis and M. tuberculosis. The transcrip- 
tional landscape of M. tuberculosis during exponential in vitro 
growth has recently been elucidated by Cortes et al. (17). M. bovis 
differs from M. tuberculosis in its host preference but possesses an 
almost identical genome and contains no unique genes. Given that 
differential gene expression may contribute to the molecular un- 
derpinnings of their differing phenotypes, we sought to determine 
the extent of overlap of the primary transcriptomes of M. bovis and 
M. tuberculosis. As the overall TSS numbers called for M. bovis 
were higher, we benchmarked the degree of conservation against 
TSSs identified in M. tuberculosis (Fig. 5). We found overall that a 
majority of the pTSSs, asTSSs, and iTSSs of the two data sets over- 
lap (Fig. 5), while the level of overlap of altTSSs was lower. 

It is likely that many of the TSSs detected only in M. bovis are 
also functional in M. tuberculosis but have not been annotated, 
perhaps due to the sequencing depth employed. Indeed, inspec- 
tion of the TSS-seq data generated for M. tuberculosis (17) shows a 
significantly greater pileup of TEX + reads at such positions than at 
randomly selected genomic sites (P <2.2e-16, Pearson's chi- 
square test) (see Fig. S4 in the supplemental material). A subset of 
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FIG 4 Conservation of attenuation patterns in the 5' UTRs of whiB7. Transcription initiates at conserved primary TSSs located -350 nt upstream of the whiB7 
start codons and peaks in both species within the indicated upstream ORFs (uORFs). Transcriptional attenuation then occurs at the 3' ends of the uORFs, 
proximal to a conserved stem-loop structure followed by a U-rich stretch, predicted to function as a rho-independent terminator (RIT). 



differences, however, maybe attributable to mutations giving rise 
to the generation or disruption of — 10 box motifs, resulting in the 
emergence or deletion of TSSs. In two recent studies, SNP-driven 
promoter differences were found to account for over 10% of tran- 
scriptional differences between clinical isolates of M. tuberculosis 
(42) and to give rise to strain-specific asRNA transcripts in M. bo- 
vis (43). 

A total of 52 of 219 orthologs which have previously been re- 
ported to be differentially expressed between the species (44, 45) 
are associated with SNP-driven promoter differences (see Ta- 
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FIG 5 Overlap of the TSS maps of M. bov/s and M. tuberculosis. For each TSS 
category, the percentage of potential overlap achieved is indicated. The overlap 
of pTSS is highest, with a majority of asTSSs and iTSSs also shared. 



ble S5 in the supplemental material). For example, whiB6, encod- 
ing a putative oxidoreductase, has been shown to be highly ex- 
pressed in M. bovis relative to M. tuberculosis (44, 45). Closer 
inspection of our TSS map shows a consensus —10 box motif 
present upstream of the gene in M. bovis, while an insertion event 
has disrupted this motif in M. tuberculosis. Overall, we identified 
clear correlations between SNP driven-promoter activity and 
published transcriptional differences between M. bovis and M. tu- 
berculosis, which will assist in interpreting the molecular basis for 
their broader phenotypic differences. 

DISCUSSION 

Ours is the first study comparing primary transcriptomes across 
mycobacterial species, and we show that a high level of conserva- 
tion of 5' UTRs within CPT orthologs has been maintained be- 
tween M. marinum and M. bovis, indicating the potential for a 
conserved regulatory architecture. Where deviations are observed 
in 5' UTR length, we point toward MIRUs as a potential contrib- 
utor to transcriptional regulation in the M. tuberculosis complex. 
We find evidence of evolutionarily conserved transcriptional at- 
tenuation centered on a uORF within the 5' UTR of whiB7 and 
highlight the role that it may play in w/jz'B7-mediated antibiotic 
resistance. The —10 box is the most prominent arbiter of tran- 
scriptional initiation, but we also identify a conserved signal at the 
—35 region of a subset of mycobacterial promoters. Finally, our 
study results point to SNP-driven promoter differences as a major 
driver underlying the transcriptional differences between M. bovis 
and M. tuberculosis. 

Our study was designed to address the influence of differing 
modes of selection on the primary transcriptome, and recent stud- 
ies have predicted that there is at least some degree of noncon- 
served transcriptional noise within certain classes of NCRNA (9). 
Evolutionary theory posits that many distinctive genome features 
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have their genesis in nonadaptive forces that predominate in the 
environments of certain lineages (46). Our comparison of M. ma- 
rinum and M. bovis, whose genomes have been molded by distinct 
experiences of natural selection, illustrates that this can be clearly 
observed within their primary transcriptomes. Previous studies 
have shown that AT-enriching SNPs are more abundant than ex- 
pected in the genomes of members of the M. tuberculosis complex 
(35). 

It would appear that such an accumulation of SNPs, ineffec- 
tively purged by natural selection, has set the stage for an over- 
abundance of promoter sites in the M. tuberculosis complex rela- 
tive to other generalist mycobacteria. Our study further indicated 
that where such sites exist, the majority are utilized. An underrep- 
resentation of nonfunctional promoter motifs has previously been 
correlated with translational efficiency and increased growth rate 
(47), suggesting that an abundance of motifs could have a negative 
impact on metabolic activity and growth rate. Strong intragenic 
— 10 box motifs could theoretically cause the ribosome to stall at 
internal sites within genes or could bind RNAP to maintain a local 
abundance of RNAP molecules close to the pTSS. 

The extent to which this noisy transcriptome implies increased 
regulation by NCRNA is difficult to determine, as our findings 
argue for the primary transcriptome as a milieu of adaptive and 
nonadaptive components. However, SNP-driven promoter usage 
clearly is an important contributor to transcriptional diversity 
within the M. tuberculosis complex and recent studies suggest that 
double-stranded RNA (dsRNA) plays a major role in gene regula- 
tion (48). If this holds true for mycobacteria, then members of the 
M. tuberculosis complex clearly have a greater reservoir from 
which to initiate transcription than closely related species. Prob- 
lematically, few reliable methods have emerged for identifying 
functional NCRNAs and even highly expressed antisense tran- 
scripts have been shown to display little evolutionary conserva- 
tion. Our report shows that this dearth of conservation affects all 
nonprimary classes of TSS, making prediction of in silico function 
intrinsically difficult. We find that, although a core of intergenic 
sRNAs is shared, many are associated with 5' UTRs or may repre- 
sent misannotated peptides. 

The picture emerges of a transcriptome highly conserved in 
parts, coupled with the potential to generate immense transcrip- 
tional diversity. The simplest explanation is that those regions that 
are hypervariable are passively accumulated, arising from the re- 
duction in the population size of the M. tuberculosis complex rel- 
ative to closely related species of nontuberculous mycobacteria 
(NTM). An alternative interpretation requires that selective forces 
are operating to increase the level of transcriptional complexity in 
the M. tuberculosis complex in excess of that of their closest rela- 
tives. 

Our report predicts that many unique features of a pathogen's 
primary transcriptome are merely byproducts of its population 
history rather than active participants in adaptation to pathoge- 
nicity. In any event, consideration of the contributions that non- 
adaptive neutral forces make in generating transcriptional and 
regulatory networks seems a necessary prerequisite in order to sort 
the signal from the noise. 

MATERIALS AND METHODS 

Additional details of the experimental procedures are provided in Docu- 
ment SI in the supplemental material. 



Cell culture and RNA isolation. M. bovis AF2122/97 was grown in 
roller bottle culture at 37°C in Middlebrook 7H9 medium supplemented 
with 10% albumin- dextrose catalase (ADC; Difco), 0.05% Tween, and 
10 mM pyruvate. M. marinum M was grown in roller bottle culture at 
30°C in Middlebrook 7H9 medium supplemented with 10% albumin- 
dextrose catalase (ADC; Difco), 0.05% Tween, and 0.5% glycerol. 

Total RNA samples were then isolated using an RNeasy Plus Minikit 
(Qiagen) in accordance with the manufacturer's instructions. Samples 
were quantified using a Nanodrop 1000 spectrophotometer (Thermo Sci- 
entific), and the quality of the RNA was assessed by visualization on aga- 
rose gels and by measuring the sample's A 260 /A 280 ratio (> 1.8 required). 

cDNA library preparation. TSS sequencing (TSS-seq) cDNA libraries 
were constructed and sequenced as single-ended, strand-specific reads on 
a single lane using an Illumina HiSeq 2000 machine by vertis Biotechnolo- 
gie AG. Standard RNA-seq libraries were also prepared for an Illumina 
Genome Analyzer IIx platform using RNA from exponential-phase cul- 
tures of both species. 

Read mapping and statistical analyses. Reads were aligned to the ref- 
erence genomes ofM. bovis AF2 122/97 (NC_002945.3) andM. marinum 
M (NC_010612.1) using Bow tie (49) with the "-best" flag, and biological 
replicates were processed separately. 

The TSS-seq (TEX + and TEX - ) libraries yielded an average of 12.5 
million reads per bioreplicate of M. bovis and 13.0 million reads per 
bioreplicate of M. marinum, 80% and 77% of which mapped to the re- 
spective genomes. On average, 63% and 5 1 % of the mapped reads, respec- 
tively, corresponded to rRNA and were discarded from further analyses. 
The standard RNA-seq libraries yielded 5.8 million reads on average for 
M. bovis and 16.6 million reads on average for M. marinum, 71% and 69% 
of which mapped to their respective genomes following removal of rRNA 
reads (-5% and -24%). 

Interspecies TSS and promoter comparison. Orthologous genes 
shared between M. marinum and M. bovis were retrieved from the Effi- 
cient Database framework for Comparative Genome Analyses (EDGAR) 
(50). Genome-wide TSS maps for both species were generated by assign- 
ing all TSSs to at least one of five categories based on their locations 
relative to annotated genes, in an approach similar to that of Sharma et al. 
(7). Each TSS was used to define a promoter region 50 nt in length imme- 
diately upstream. TSSs associated with orthologous genes were assigned to 
one of three categories, following a classification scheme used by Kim et al. 
(11). Promoters were extracted and aligned (51) to the corresponding 
orthologous genomic regions by the use of ClustalW2. Circular genome 
representations were created using Circos (52). 

Nucleotide sequence accession number. The sequencing data de- 
scribed in this report have been submitted to the NCBI gene expres- 
sion omnibus (GEO) under accession number GSE51881. 
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